Load the necessary libraries
library(car) #for regression diagnostics
library(broom) #for tidy output
library(ggfortify) #for model diagnostics
library(sjPlot) #for outputs
library(knitr) #for kable
library(effects) #for partial effects plots
library(ggeffects) #for partial effects plots
library(emmeans) #for estimating marginal means
library(MASS) #for glm.nb
library(MuMIn) #for AICc
library(tidyverse) #for data wrangling
library(nlme)
library(lme4)
library(glmmTMB)
library(broom.mixed)
library(glmmTMB) #for glmmTMB
library(DHARMa) #for residuals and diagnostics
library(performance) #for diagnostic plots
library(see) #for diagnostic plots
theme_set(theme_classic())
To investigate differential metabolic plasticity in barramundi (Lates calcarifer), Norin, Malte, and Clark (2015) exposed juvenile barramundi to various environmental changes (increased temperature, decreased salinity and increased hypoxia) as well as control conditions. Metabolic plasticity was calculated as the percentage difference in standard metabolic rate between the various treatment conditions and the standard metabolic rate under control conditions. They were interested in whether there was a relationship between metabolic plasticity and typical (control) metabolism and how the different treatment conditions impact on this relationship.
A total of 60 barramundi juveniles were subject to each of the three conditions (high temperature, low salinity and hypoxia) in addition to control conditions. Fish mass was also recorded as a covariate as this is known to influence metabolic parameters.
Barramundi
Sampling design
Format of norin.csv data files
| fishid | mass | trial | smr_contr | change |
|---|---|---|---|---|
| 1 | 35.69 | LowSalinity | 5.85 | -31.92 |
| 2 | 33.84 | LowSalinity | 6.53 | 2.52 |
| 3 | 37.78 | LowSalinity | 5.66 | -6.28 |
| .. | .. | .. | .. | .. |
| 1 | 36.80 | HighTemperature | 5.85 | 18.32 |
| 2 | 34.98 | HighTemperature | 6.53 | 19.06 |
| 3 | 38.38 | HighTemperature | 5.66 | 19.03 |
| .. | .. | .. | .. | .. |
| 1 | 45.06 | Hypoxia | 5.85 | -18.61 |
| 2 | 43.51 | Hypoxia | 6.53 | -5.37 |
| 3 | 45.11 | Hypoxia | 5.66 | -13.95 |
| fishid | Categorical listing of the individual fish that are repeatedly sampled |
| mass | Mass (g) of barramundi. Covariate in analysis |
| trial | Categorical listing of the trial (LowSalinity: 10ppt salinity; HighTemperature: 35 degrees; Hypoxia: 45% air-sat. oxygen. |
| smr_contr | Standard metabolic rate (mg/h/39.4 g of fish) under control trial conditions (35 ppt salinity, 29 degrees, normoxia) |
| change | Percentage difference in Standard metabolic rate (mg/h/39.4 g of fish) between Trial conditions and control adjusted for 'regression to the mean'. |
norin <- read_csv('../data/norin.csv', trim_ws=TRUE) %>%
janitor::clean_names() %>%
mutate(fishid = factor(fishid), trial = factor(trial))
glimpse(norin)
## Rows: 180
## Columns: 5
## $ fishid <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ mass <dbl> 35.69, 33.84, 37.78, 26.58, 37.62, 37.68, 30.62, 50.37, 24.…
## $ trial <fct> LowSalinity, LowSalinity, LowSalinity, LowSalinity, LowSali…
## $ smr_contr <dbl> 5.847466, 6.530707, 5.659556, 6.278200, 4.407336, 4.818589,…
## $ change <dbl> -31.919389, 2.520929, -6.284968, -4.346675, -3.071329, -15.…
norin %>%
ggplot(aes(x = trial, y = change)) +
geom_boxplot()
norin %>%
ggplot(aes(x = smr_contr, y = change, shape = trial, color = trial)) +
geom_smooth(method = "lm") +
geom_point()
We can see that smr_contr + trial is similar to an ANCOVA design, as we need to see if the lines are parallel or if they are not, indicating an interaction.
norin %>%
ggplot(aes(x = mass, y = change, shape = trial, color = trial)) +
geom_smooth(method = "lm") +
geom_point()
We can see that mass doesn’t seem to affect change much.
We can add in mass as an offset - doesn’t cost any df, but does soak up the additional variance associated. HOWEVER, it assumes the effect of mass is a 1:1 effect on response. Obviously this is not the case here.
Density is the classic example for using offset. Instead of counts, we can put it in as an offset.
Model formula: \[ y_i \sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i =\boldsymbol{\beta} \bf{X_i} + \boldsymbol{\gamma} \bf{Z_i} \]
where \(\boldsymbol{\beta}\) and \(\boldsymbol{\gamma}\) are vectors of the fixed and random effects parameters respectively and \(\bf{X}\) is the model matrix representing the overall intercept and effects of temperature and (centered) mean fish size on SDA peak. \(\bf{Z}\) represents a cell means model matrix for the random intercepts associated with individual fish.
For each of the following modelling alternatives, we will:
norin_lmm1 <- glmmTMB(change ~ trial * scale(smr_contr, scale = FALSE) +
offset(mass) + (1|fishid), data = norin,
REML = F)
better_opt <- glmmTMBControl(optimizer = "optim",
optArgs = "Nelder-Mead")
norin_lmm2 <- glmmTMB(change ~ trial * scale(smr_contr, scale = FALSE) +
scale(mass, scale=F) + (1|fishid), data = norin,
REML = F, control = better_opt) # still has problems, even with the optimizer
norin_lmm3 <- glmmTMB(change ~ trial * scale(smr_contr, scale = FALSE) +
(1|fishid), data = norin,
REML = F)
AICc(norin_lmm1, norin_lmm2, norin_lmm3)
We cannot get the model with mass to converge. This may be a signal that this model is simply too complicated for the data available.
We add a third model to see what it looks like without mass included as an offset.
This third model is the best according to AICc, thus we drop mass as an important factor.
Conclusions:
norin_lmm3 <- glmmTMB(change ~ trial * scale(smr_contr, scale = FALSE) +
(1|fishid), data = norin,
REML = T)
norin_lmm4 <- glmmTMB(change ~ trial * scale(smr_contr, scale = FALSE) +
(trial|fishid), data = norin,
REML = T)
AICc(norin_lmm3, norin_lmm4)
The random slope and intercept model is preferred. Thus, allowing each fish to have a different response
We could try the smr_contr as a random slope, but the problem is that each fish has the same base metabolism, so there is no variability within each fish to fit random slopes to!
norin %>%
arrange(fishid) %>% select(fishid, smr_contr) %>% head(10)
Conclusions:
norin_lmm4 %>% simulateResiduals(plot=T)
## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help.
##
## Scaled residual values: 0.14 0.548 0.38 0.476 0.312 0.196 0.288 0.872 0.616 0.644 0.772 0.66 0.54 0.74 0.204 0.288 0.032 0.148 0.02 0.224 ...
Literally near perfect.
Using sjPlot::plot_model
norin_lmm4 %>% plot_model(type = 'eff', show.data = T) %>% plot_grid() # Misleading due to the assumption that there is no interactions
norin_lmm4 %>% plot_model(type = 'eff', terms = c("smr_contr", "trial"),
show.data = T) # put both plots together
Using emmeans:
norin_lmm4 %>% ggemmeans(~smr_contr|trial) %>% plot(add.data=T) # assumes no interaction
Note: that ggemmeans seems to cause errors with the x-axis coordinates of the points! This seems to be an error!
Using REML fit for interpretation.
summary(norin_lmm4)
## Family: gaussian ( identity )
## Formula:
## change ~ trial * scale(smr_contr, scale = FALSE) + (trial | fishid)
## Data: norin
##
## AIC BIC logLik deviance df.resid
## 1594.0 1635.5 -784.0 1568.0 173
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev. Corr
## fishid (Intercept) 333.013 18.249
## trialHypoxia 355.899 18.865 -0.62
## trialLowSalinity 1051.420 32.426 -0.21 -0.05
## Residual 8.452 2.907
## Number of obs: 180, groups: fishid, 60
##
## Dispersion estimate for gaussian family (sigma^2): 8.45
##
## Conditional model:
## Estimate Std. Error z value
## (Intercept) 50.382 2.386 21.119
## trialHypoxia -50.226 2.493 -20.150
## trialLowSalinity -42.223 4.220 -10.006
## scale(smr_contr, scale = FALSE) -21.553 3.821 -5.641
## trialHypoxia:scale(smr_contr, scale = FALSE) 13.511 3.992 3.384
## trialLowSalinity:scale(smr_contr, scale = FALSE) 10.189 6.758 1.508
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## trialHypoxia < 2e-16 ***
## trialLowSalinity < 2e-16 ***
## scale(smr_contr, scale = FALSE) 1.69e-08 ***
## trialHypoxia:scale(smr_contr, scale = FALSE) 0.000714 ***
## trialLowSalinity:scale(smr_contr, scale = FALSE) 0.131648
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Evidence of slope x intercept correlation for high salin vs. hypoxia and high salin (-0.62) vs. low salin (r = -0.21), but not super important for model interpretation.
Very high correlations between variables indicates that bizarre stuff is happening…
cov2cor(vcov(norin_lmm4)$cond)
## (Intercept) trialHypoxia
## (Intercept) 1.000000e+00 -6.203202e-01
## trialHypoxia -6.203202e-01 1.000000e+00
## trialLowSalinity -2.150696e-01 -3.513196e-02
## scale(smr_contr, scale = FALSE) -3.745067e-15 3.350838e-15
## trialHypoxia:scale(smr_contr, scale = FALSE) 1.768497e-15 -1.991780e-15
## trialLowSalinity:scale(smr_contr, scale = FALSE) 1.990724e-15 -1.511568e-15
## trialLowSalinity
## (Intercept) -2.150696e-01
## trialHypoxia -3.513196e-02
## trialLowSalinity 1.000000e+00
## scale(smr_contr, scale = FALSE) -4.128113e-15
## trialHypoxia:scale(smr_contr, scale = FALSE) 2.613165e-15
## trialLowSalinity:scale(smr_contr, scale = FALSE) 8.487862e-16
## scale(smr_contr, scale = FALSE)
## (Intercept) -4.634123e-15
## trialHypoxia 2.549910e-15
## trialLowSalinity 5.469717e-16
## scale(smr_contr, scale = FALSE) 1.000000e+00
## trialHypoxia:scale(smr_contr, scale = FALSE) -6.203202e-01
## trialLowSalinity:scale(smr_contr, scale = FALSE) -2.150696e-01
## trialHypoxia:scale(smr_contr, scale = FALSE)
## (Intercept) 3.524682e-15
## trialHypoxia -2.260970e-15
## trialLowSalinity -6.714704e-16
## scale(smr_contr, scale = FALSE) -6.203202e-01
## trialHypoxia:scale(smr_contr, scale = FALSE) 1.000000e+00
## trialLowSalinity:scale(smr_contr, scale = FALSE) -3.513196e-02
## trialLowSalinity:scale(smr_contr, scale = FALSE)
## (Intercept) -3.082333e-15
## trialHypoxia 1.418294e-15
## trialLowSalinity 2.128408e-15
## scale(smr_contr, scale = FALSE) -2.150696e-01
## trialHypoxia:scale(smr_contr, scale = FALSE) -3.513196e-02
## trialLowSalinity:scale(smr_contr, scale = FALSE) 1.000000e+00
For random effect terms: x+(x|g) = 1+x+(1+x|g) –> Correlated random intercept and slope terms x+(x||g) = 1+x+(1|g)+(0+x|g) –> Uncorrelated random intercept and slope terms
summary(norin_lmm4)
## Family: gaussian ( identity )
## Formula:
## change ~ trial * scale(smr_contr, scale = FALSE) + (trial | fishid)
## Data: norin
##
## AIC BIC logLik deviance df.resid
## 1594.0 1635.5 -784.0 1568.0 173
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev. Corr
## fishid (Intercept) 333.013 18.249
## trialHypoxia 355.899 18.865 -0.62
## trialLowSalinity 1051.420 32.426 -0.21 -0.05
## Residual 8.452 2.907
## Number of obs: 180, groups: fishid, 60
##
## Dispersion estimate for gaussian family (sigma^2): 8.45
##
## Conditional model:
## Estimate Std. Error z value
## (Intercept) 50.382 2.386 21.119
## trialHypoxia -50.226 2.493 -20.150
## trialLowSalinity -42.223 4.220 -10.006
## scale(smr_contr, scale = FALSE) -21.553 3.821 -5.641
## trialHypoxia:scale(smr_contr, scale = FALSE) 13.511 3.992 3.384
## trialLowSalinity:scale(smr_contr, scale = FALSE) 10.189 6.758 1.508
## Pr(>|z|)
## (Intercept) < 2e-16 ***
## trialHypoxia < 2e-16 ***
## trialLowSalinity < 2e-16 ***
## scale(smr_contr, scale = FALSE) 1.69e-08 ***
## trialHypoxia:scale(smr_contr, scale = FALSE) 0.000714 ***
## trialLowSalinity:scale(smr_contr, scale = FALSE) 0.131648
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tidy(norin_lmm4, conf.int=T, effect='fixed')
Significant slope between hypoxia and physiology, not significant between low salinity and physiology, but missing comparison of high salinity and physiology! Use emtrends to do this.
emtrends(norin_lmm4, pairwise ~ trial, var = "smr_contr")
## $emtrends
## trial smr_contr.trend SE df lower.CL upper.CL
## HighTemperature -21.55 3.82 173 -29.1 -14.01
## Hypoxia -8.04 3.41 173 -14.8 -1.32
## LowSalinity -11.36 7.01 173 -25.2 2.48
##
## Confidence level used: 0.95
##
## $contrasts
## contrast estimate SE df t.ratio p.value
## HighTemperature - Hypoxia -13.51 3.99 173 -3.384 0.0025
## HighTemperature - LowSalinity -10.19 6.76 173 -1.508 0.2898
## Hypoxia - LowSalinity 3.32 7.97 173 0.417 0.9087
##
## P value adjustment: tukey method for comparing a family of 3 estimates
Hypoxia vs. high temp significant, everything else is not significant.
norin_lmm4 %>%
emmeans(~ trial) %>% # get differences
regrid() %>% # does backtransform of the differences BEFORE pair-wise comparisons
pairs() %>% # get pair-wise absolute values of differences
confint() # get confidence interval limits for these differences
We wish to compare the pair-wise differences among these trials at specific values of physiology (smr_contr). To do so, we can create a prediction grid, then use emmeans:
norin_grid <- with(norin,
list(smr_contr = c(min(smr_contr),
mean(smr_contr),
max(smr_contr))))
emmeans(norin_lmm1, pairwise ~ trial|smr_contr, at = norin_grid)
## $emmeans
## smr_contr = 3.98:
## trial emmean SE df lower.CL upper.CL
## HighTemperature 75.99 6.93 172 62.32 89.66
## Hypoxia 5.81 6.93 172 -7.86 19.49
## LowSalinity 20.93 6.93 172 7.26 34.61
##
## smr_contr = 5.18:
## trial emmean SE df lower.CL upper.CL
## HighTemperature 52.29 3.21 172 45.96 58.62
## Hypoxia -3.83 3.21 172 -10.16 2.51
## LowSalinity 10.23 3.21 172 3.90 16.57
##
## smr_contr = 6.70:
## trial emmean SE df lower.CL upper.CL
## HighTemperature 22.10 8.45 172 5.41 38.79
## Hypoxia -16.11 8.45 172 -32.79 0.58
## LowSalinity -3.39 8.45 172 -20.08 13.29
##
## Confidence level used: 0.95
##
## $contrasts
## smr_contr = 3.98:
## contrast estimate SE df t.ratio p.value
## HighTemperature - Hypoxia 70.2 8.47 172 8.289 <.0001
## HighTemperature - LowSalinity 55.1 8.47 172 6.503 <.0001
## Hypoxia - LowSalinity -15.1 8.47 172 -1.786 0.1773
##
## smr_contr = 5.18:
## contrast estimate SE df t.ratio p.value
## HighTemperature - Hypoxia 56.1 3.92 172 14.309 <.0001
## HighTemperature - LowSalinity 42.1 3.92 172 10.724 <.0001
## Hypoxia - LowSalinity -14.1 3.92 172 -3.586 0.0013
##
## smr_contr = 6.70:
## contrast estimate SE df t.ratio p.value
## HighTemperature - Hypoxia 38.2 10.33 172 3.698 0.0008
## HighTemperature - LowSalinity 25.5 10.33 172 2.468 0.0386
## Hypoxia - LowSalinity -12.7 10.33 172 -1.231 0.4368
##
## P value adjustment: tukey method for comparing a family of 3 estimates
Shows the differences at each specific level
r.squaredGLMM(norin_lmm4)
## R2m R2c
## [1,] 0.4942093 0.9927262
performance::r2_nakagawa(norin_lmm4)
## # R2 for Mixed Models
##
## Conditional R2: 0.993
## Marginal R2: 0.494
norin_grid <- with(norin,
list(smr_contr = modelr::seq_range(smr_contr, n=100)))
newdata = emmeans(norin_lmm4, ~ smr_contr|trial, at = norin_grid) %>%
as.data.frame() %>%
rename(change = emmean, lwr = lower.CL, upr = upper.CL)
(g1 <- newdata %>%
ggplot(aes(x = smr_contr, y = change)) +
geom_ribbon(aes(ymin = lwr, ymax = upr, fill = trial), alpha = 0.3) +
geom_line(aes(color = trial)) +
theme(legend.position = c(0.99, 0.99),
legend.justification = c(1,1)))
# geom_point(data = norin, aes(col = trial)) # apparently is bad?
However, we can’t add the plot the same way, as the model data values are standardized in a specific way according to the random effect
Thus, we have to add data a different way… Step 1. Get predicted value for each individual via augment Step 2. Add on the x variables lost via augment Step 3. Add on residual value to obtain standardized values
obs <- augment(norin_lmm4) %>%
bind_cols(dplyr::select(norin, smr_contr)) %>%
mutate(partial_obs = .fitted + .resid)
g1 +
# geom_point(data = norin, aes(col = trial)) # looks the same, but in models where the other variables not being plotted explain a lot of the variation, will be drastically different!
geom_point(data = obs, aes(y = partial_obs, color = trial))